Inverse Reinforcement Learning with Explicit Policy Estimates

Authors

Abstract

Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics. In particular, the method of Maximum Causal Entropy IRL is based on the perspective of entropy maximization, while related advances in the field of economics instead assume the existence of unobserved action shocks to explain expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm). In this work, we make previously unknown connections between these related methods from both fields. We achieve this by showing that they all belong to a class of optimization problems, characterized by a common form of the objective, the associated policy, and the objective gradient. We demonstrate key computational and algorithmic differences between the methods which arise due to an approximation of the optimal soft value function, and describe how this leads to more efficient algorithms. Using insights which emerge from our study, we identify various problem scenarios and investigate each method's suitability for these problems.
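The abstract refers to the optimal soft value function, the object that Maximum Causal Entropy IRL computes and that the discussed approximations target. As background, a minimal sketch of soft (log-sum-exp) value iteration on a toy deterministic MDP is shown below; the states, rewards, and transitions are illustrative placeholders, not taken from the paper.

```python
import math

def soft_value_iteration(R, P, gamma=0.9, iters=200):
    """Soft value iteration on a small deterministic MDP.

    R[s][a] is the reward for action a in state s; P[s][a] is the
    resulting next state. Returns the soft values
    V(s) = log sum_a exp(Q(s,a)) and the induced softmax policy
    pi(a|s) = exp(Q(s,a) - V(s)).
    """
    n_states = len(R)
    V = [0.0] * n_states
    for _ in range(iters):
        # Soft Bellman backup: Q(s,a) = R(s,a) + gamma * V(s')
        Q = [[R[s][a] + gamma * V[P[s][a]] for a in range(len(R[s]))]
             for s in range(n_states)]
        # Log-sum-exp replaces the hard max of standard value iteration
        V = [math.log(sum(math.exp(q) for q in Q[s])) for s in range(n_states)]
    policy = [[math.exp(Q[s][a] - V[s]) for a in range(len(R[s]))]
              for s in range(n_states)]
    return V, policy

# Two-state chain: action 0 stays in place, action 1 moves to the other state.
R = [[0.0, 1.0], [0.5, 0.0]]
P = [[0, 1], [1, 0]]
V, pi = soft_value_iteration(R, P)
```

The log-sum-exp backup is what makes the resulting policy stochastic (a softmax over Q-values) rather than greedy; approximating this fixed point is where, per the abstract, the computational differences between the methods arise.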


Similar Articles

Inverse Reinforcement Learning through Policy Gradient Minimization

Inverse Reinforcement Learning (IRL) deals with the problem of recovering the reward function optimized by an expert, given a set of demonstrations of the expert’s policy. Most IRL algorithms need to repeatedly compute the optimal policy for different reward functions. This paper proposes a new IRL approach that makes it possible to recover the reward function without solving any “direct” RL pr...

Inverse Reinforcement Learning with PI2

We present an algorithm that recovers an unknown cost function from expert-demonstrated trajectories in continuous space. We assume that the cost function is a weighted linear combination of features, and we are able to learn weights that result in a cost function under which the expert demonstrated trajectories are optimal. Unlike previous approaches [1], [2], our algorithm does not require re...

Reinforcement Learning with Policy Constraints

This paper addresses the problem of knowledge transfer in lifelong reinforcement learning. It proposes an algorithm which learns policy constraints, i.e., rules that characterize action selection in entire families of reinforcement learning tasks. Once learned, policy constraints are used to bias learning in future, similar reinforcement learning tasks. The appropriateness of the algorithm is d...

Repeated Inverse Reinforcement Learning

We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks in which it surprises the human by acting suboptimally with respect to how the human would have acted. Each time the human is surprised, the agent is provided a demonstration of the desired behavior by the human. We formali...

Bayesian Inverse Reinforcement Learning

Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an expert. IRL is motivated by situations where knowledge of the rewards is a goal by itself (as in preference elicitation) and by the task of apprenticeship learning (learning policies from an expert). In this paper we sh...


Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i11.17141